March 2026 perf improvements #235

Open
struct wants to merge 1 commit into master from march_2026_perf_improvements

struct (Owner) commented Mar 9, 2026

Performance fixes from Claude!

iso_alloc_zone_t field reordering — include/iso_alloc_ds.h
The biggest win. is_full previously sat at an offset of roughly 2,119 bytes (cache line 33), buried after the 2,040-byte free_bit_slots[255] array. Every call to is_zone_usable(), the first check in the hot allocation path, had to load that field from a cache line far away from the other hot fields, and would likely miss the cache doing so.

The new layout puts all hot fields (user_pages_start, bitmap_start, next_free_bit_slot, canary_secret, pointer_mask, max_bitmap_idx, chunk_size, free_bit_slots_usable, free_bit_slots_index, is_full, internal) in the first 64 bytes (one cache line). The large free_bit_slots[255] array, which is only accessed during free-list refills, moves to the end.
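A rough sketch of the hot/cold split described above, with the property checked at compile time. The struct name, field types, and the 64-byte cache-line size are assumptions for illustration; this is not the actual iso_alloc_zone_t definition from include/iso_alloc_ds.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size; 64 bytes is typical on x86-64 and many ARM cores. */
#define CACHE_LINE_SIZE 64

/* Hypothetical layout: field names follow the PR text, types are guesses. */
typedef struct {
    /* Hot fields, read on every allocation. */
    void *user_pages_start;
    void *bitmap_start;
    int64_t next_free_bit_slot;
    uint64_t canary_secret;
    uint64_t pointer_mask;
    uint32_t max_bitmap_idx;
    uint32_t chunk_size;
    uint8_t free_bit_slots_usable;
    uint8_t free_bit_slots_index;
    bool is_full;
    bool internal;
    /* Cold field, touched only on free-list refills: 255 * 8 = 2,040 bytes. */
    int64_t free_bit_slots[255];
} example_zone_t;

/* Fail the build if a later edit pushes the last hot field past the first line. */
static_assert(offsetof(example_zone_t, internal) + sizeof(bool) <= CACHE_LINE_SIZE,
              "hot fields must fit in the first cache line");
```

A static_assert of this kind keeps the layout from silently regressing when fields are added later. It only guarantees the offsets; the struct itself still has to be allocated on a cache-line boundary for the hot fields to share one line.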

__builtin_ctzll in iso_scan_zone_free_slot_slow — src/iso_alloc.c
Replaced all inner for(j = 0; j < 64; j += 2) loops with:

uint64_t free_mask = ~(uint64_t)bts & USED_BIT_VECTOR;
if (free_mask) return (offset + __builtin_ctzll(free_mask));
USED_BIT_VECTOR = 0x5555... selects even-position bits (one per chunk). Inverting bts and ANDing it with USED_BIT_VECTOR gives a mask with a 1 for each free slot. CTZ (count trailing zeros) then finds the first free slot in one instruction instead of up to 32 loop iterations. Applied to all three paths: NEON, __int128 (split into two 64-bit halves), and standard.
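To make the equivalence concrete, here is a minimal standalone sketch of the CTZ form next to the loop it replaces. EXAMPLE_EVEN_BITS and both function names are hypothetical, and the sketch assumes the elided constant is the full 64-bit alternating pattern. __builtin_ctzll is a GCC/Clang builtin and is undefined for a zero argument, hence the guard.

```c
#include <stdint.h>

/* Assumption: a 64-bit mask with every even-position bit set. */
#define EXAMPLE_EVEN_BITS 0x5555555555555555ULL

/* CTZ form: bit index of the first free chunk in a bitmap word, or -1 if none. */
static int first_free_bit(uint64_t word) {
    uint64_t free_mask = ~word & EXAMPLE_EVEN_BITS;
    if (free_mask == 0) {
        return -1; /* __builtin_ctzll(0) is undefined, so bail out first */
    }
    return __builtin_ctzll(free_mask);
}

/* Loop form, kept only as a reference to compare results against. */
static int first_free_bit_loop(uint64_t word) {
    for (int j = 0; j < 64; j += 2) {
        if (((word >> j) & 1) == 0) {
            return j;
        }
    }
    return -1;
}
```

Both functions should return the same index for any input; the odd-position bits are masked out and never affect the result.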

__builtin_ctzll in fill_free_bit_slots — src/iso_alloc.c
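Same technique for populating the free-list cache, replacing the 32-iteration inner loop in the partial-word case with a free_mask &= free_mask - 1 iteration (classic "iterate over set bits" idiom). Each pass clears the lowest set bit, so the loop runs once per free slot rather than once per bit pair.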
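The idiom can be sketched on its own as follows. The function name, the output-array interface, and EXAMPLE_EVEN_BITS are hypothetical and only illustrate the loop shape; the real fill_free_bit_slots has its own bookkeeping that is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: a 64-bit mask with every even-position bit set. */
#define EXAMPLE_EVEN_BITS 0x5555555555555555ULL

/* Writes the bit slot (base + bit index) of each free chunk in `word` to `out`,
 * storing at most `cap` entries. Returns the number of entries written. */
static size_t collect_free_slots(uint64_t word, uint64_t base, uint64_t *out, size_t cap) {
    uint64_t free_mask = ~word & EXAMPLE_EVEN_BITS;
    size_t n = 0;

    while (free_mask != 0 && n < cap) {
        out[n++] = base + (uint64_t)__builtin_ctzll(free_mask);
        free_mask &= free_mask - 1; /* clear the lowest set bit */
    }
    return n;
}
```

For a word with only two free chunks, the loop body runs twice instead of testing all 32 positions.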

Zone cache scan direction — src/iso_alloc.c
Changed the thread zone-cache scan from oldest-first (0 → count-1) to newest-first (count-1 → 0). The cache is populated LIFO, so the most recently used zone — most likely to still have free slots — is found on the first iteration instead of the last.
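A small sketch of the newest-first scan, assuming entries are appended at the end of a fixed-size array. The types, the cache size, the matching rule, and the probe counter are all hypothetical; they only show the loop direction, not the actual thread cache in src/iso_alloc.c.

```c
#include <stdbool.h>
#include <stddef.h>

#define EXAMPLE_CACHE_SZ 8 /* hypothetical cache capacity */

typedef struct {
    size_t chunk_size;
    bool is_full;
} example_cached_zone_t;

typedef struct {
    example_cached_zone_t *zones[EXAMPLE_CACHE_SZ];
    size_t count; /* entries are appended at zones[count] */
} example_zone_cache_t;

/* Scans from the most recently added entry down to the oldest one.
 * `probes` counts how many entries were inspected, for illustration only. */
static example_cached_zone_t *find_cached_zone(const example_zone_cache_t *cache,
                                               size_t size, size_t *probes) {
    for (size_t i = cache->count; i-- > 0;) {
        example_cached_zone_t *zone = cache->zones[i];
        (*probes)++;
        if (zone->chunk_size >= size && !zone->is_full) {
            return zone;
        }
    }
    return NULL;
}
```

The `i-- > 0` form avoids the unsigned wraparound that a `i >= 0` condition would hit when counting down with size_t.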

comments

comments

clang format

more clang format
struct force-pushed the march_2026_perf_improvements branch from 9b91ec6 to 5fd315c on March 9, 2026 12:25

jvoisin (Contributor) commented Mar 9, 2026

Are there any benchmarks to validate those claims?

2 participants